Ontology Matching (OM), which aims at finding a set of alignments across two ontologies, is a key enabler for the success of the Semantic Web. In this paper, we introduce a new perspective on this problem. By interpreting ontologies as Typed Graphs embedded in a Metric Space, we formulate the coincidence of the structures of the two ontologies. Based on this formulation, we define a mechanism to score mappings. This scoring can then be used to extract a good alignment from among a number of candidates. To this end, this paper introduces three approaches. The first one is straightforward and capable of finding the optimal alignment: it investigates all possible alignments, but its runtime complexity limits its use to small ontologies. To overcome this shortcoming, we introduce a second solution, which employs a Genetic Algorithm (GA) and shows good effectiveness on certain test collections. Finally, a third, approximative solution serves the same purpose by measuring random walks in each ontology against the other.



Keywords: coincidence-based, ontology matching, metric spaces, genetic algorithms, graph theory



1. Introduction 

In this section, we first outline the problem. A discussion of the terminology used in this paper is given next, followed by a survey of related work. The section closes with an outline of this work and its structure.

1.1. Outline of Problem 

The Semantic Web is often described as the next generation of the Web, in which information is given a well-defined semantics so that computer agents can use it in the same way that human beings do. Unlike traditional knowledge-based systems, and like the Web itself, the Semantic Web is by design distributed and heterogeneous. Ontologies are intended to play a central role in making this heterogeneity manageable while simultaneously making it possible to reason about this distributed knowledge. However, in many real cases, since ontologies are created by diverse parties distant from each other (and possibly with very little shared knowledge), the ontologies themselves also suffer from heterogeneity. The need thus arises for a mechanism that tackles this heterogeneity, so that computer agents can leverage the semantic interrelationships among the entities of ontologies during reasoning. The set of mechanisms for dealing with this is usually referred to as Ontology Alignment (OA), which Ref. [1] defines as:

... given two ontologies which describe each a set of discrete entities (which can be classes, properties, rules, predicates, etc.), find the relationships (e.g., equivalence or subsumption) holding between these entities.

OM, meanwhile, appears to be a subtask of OA. Many current OM methods calculate inter-conceptual similarities using some predefined measures (phase 1) and, by interpreting the results, propose a possible set of semantic interrelationships among the entities (phase 2). Given $O_1$ and $O_2$ as the ontologies to be aligned, and defining $O = O_1 \cup O_2$, a dissimilarity (or distance) measure is typically defined formally as follows [2]:

A dissimilarity $\delta: O \times O \to \mathbb{R}$ is a mapping from a pair of entities to a real number expressing the distance between two objects such that:

$\forall x, y \in O,\ \delta(x, y) \geq 0$ (positiveness)

$\forall x \in O,\ \delta(x, x) = 0$ (minimality)

$\forall x, y \in O,\ \delta(x, y) = \delta(y, x)$ (symmetry)

It is customary to scale dissimilarity to the range 0 to 1 and to define similarity as "1 - dissimilarity".
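As a minimal sketch (not part of the original method), the three axioms and the similarity conversion can be checked on a finite dissimilarity matrix whose rows and columns both range over the entities of O. The function names `is_dissimilarity` and `to_similarity`, the tolerance `tol`, and the example values are illustrative assumptions.

```python
def is_dissimilarity(delta, tol=1e-12):
    """Check positiveness, minimality and symmetry on a square matrix,
    where delta[x][y] is the dissimilarity between entities x and y of O."""
    n = len(delta)
    if any(len(row) != n for row in delta):
        return False                                  # not an O x O matrix
    for x in range(n):
        if abs(delta[x][x]) > tol:                    # minimality
            return False
        for y in range(n):
            if delta[x][y] < -tol:                    # positiveness
                return False
            if abs(delta[x][y] - delta[y][x]) > tol:  # symmetry
                return False
    return True


def to_similarity(delta):
    """Similarity as 1 - dissimilarity; assumes delta is scaled to [0, 1]."""
    if any(not 0.0 <= v <= 1.0 for row in delta for v in row):
        raise ValueError("dissimilarities must lie in [0, 1]")
    return [[1.0 - v for v in row] for row in delta]


# Hypothetical dissimilarities among three entities of O.
delta = [
    [0.0, 0.25, 0.75],
    [0.25, 0.0, 0.5],
    [0.75, 0.5, 0.0],
]
print(is_dissimilarity(delta))   # True
print(to_similarity(delta)[0])   # [1.0, 0.75, 0.25]
```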

The many similarity measures defined in the literature are generally categorized into two groups: lexical and structural. Lexical measures are concerned with lexicographical similarity, while structural measures leverage hierarchical relationships among concepts (e.g., the number of common children, common parents, etc.).

It is also common to first define a set of similarity measures, lexical or structural, and then apply them consecutively as a compound similarity measure (Fig. 1). Applying this set of (compound) similarities yields an initial guess. The final decision is made afterward in another phase, in which the ultimate set of satisfactory correspondences between the ontologies is determined. In this view, mapping extraction is the process of finding the best mapping across ontologies.
Fig. 1. A simplified alignment framework.
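To make the two phases concrete, here is a hedged sketch of one common, simple phase-2 heuristic: greedy one-to-one extraction above a threshold from a phase-1 similarity matrix. It is not the method proposed in this paper; `extract_greedy`, the threshold, and the matrix values are hypothetical. The example also shows why phase 2 deserves care: greedy picks the pair (0, 0) with total similarity 0.9, while the mapping {0: 1, 1: 0} would total 1.65.

```python
def extract_greedy(sim, threshold=0.5):
    """Phase-2 sketch: greedily build a one-to-one mapping from a phase-1
    similarity matrix sim[i][j] (concept i of O1 vs. concept j of O2)."""
    # Visit all cells from the most to the least similar.
    cells = sorted(
        ((s, i, j) for i, row in enumerate(sim) for j, s in enumerate(row)),
        reverse=True,
    )
    used_i, used_j, mapping = set(), set(), {}
    for s, i, j in cells:
        if s < threshold:
            break                  # every remaining cell is even less similar
        if i not in used_i and j not in used_j:
            mapping[i] = j         # accept the pair; block its row and column
            used_i.add(i)
            used_j.add(j)
    return mapping


sim = [
    [0.9, 0.8],
    [0.85, 0.1],
]
print(extract_greedy(sim))                  # {0: 0}
print(extract_greedy(sim, threshold=0.05))  # {0: 0, 1: 1}
```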



1.2. Terminology 

We suspect that, without this subsection, some careful readers may be confused about how we use the terms OA and OM. This subsection is intended to remove such confusion.

First, to our knowledge, there is no consensus on a precise definition of these two terms. We therefore adopt the following definitions, which appear to be well respected in the literature:

Reference [3] defines the term Mapping as: 

a formal expression that states the semantic relation between two entities belonging to different ontologies. When this relation is oriented, this corresponds to a restriction of the usual mathematical meaning of mapping: a function (whose domain is a singleton 1 ).

It then defines OA accordingly as:

a set of correspondences between two or more (in case of multi-alignment) ontologies (by analogy with DNA sequence alignment). These correspondences are expressed as mappings.

The OA definition quoted in Section 1.1 from [1] appears to agree with this. Additionally, OM is defined in [4] as:

the problem of finding the semantic mappings 
between two given ontologies. 

(Note that Ref. [4] speaks of finding "the semantic mappings" rather than a set of possible semantic mappings. This suggests that Ref. [4] assumes the existence of some particular mapping that is superior to any alternative.)

Putting all these definitions together, our understanding is that OM is the act of finding some proper alignments, each of which, in turn, is a set of mappings. This is how we apply the terminology hereafter in this paper, which also agrees with [5]. Note, however, that we do not claim any consensus on the working definition chosen for this paper. The reader might also note that the call for papers (CFP) of ESWC 2007, for example, includes [6]:

Topics of interest to the conference include (but 
are not restricted to): 



• Ontology Alignment (mapping, matching, 
merging, mediation and reconciliation) 



which implies that OA is a set of tasks, one of which is 
OM. 

A final remark, which, theoretically speaking, is much more important, is that the Semantic Web community's understanding of the term "matching" apparently clashes with that of mathematics. Since our work, like a great many related ones, is heavily engaged with mathematics, readers coming from a mathematical background are warned that they will probably become confused if they adhere solely to their prior terminology.

1.3. Related Works 

Current research on ontology mapping and its applications spans a large number of fields, ranging from machine learning, concept lattices, and formal theories to heuristics and linguistics. Similar attempts have also been made to match graphs and trees [7,8] and database schemas [9], and even to cluster compound objects with a machine learning technique [10]. Yet there are not many works on ontologies and mapping extraction [3].

Although some works address both simultaneously, the works related to ours generally focus on one of the following:

• alignment weighting and similarity measures. This group of works mainly focuses on similarity measures (across the concepts of the two ontologies) and weight functions; the purpose is to evaluate a given alignment. Or,

• mapping extraction, in which research addresses extracting alignments and proposes methods to find a (more) proper alignment.

We briefly review each category in the following two subsections.

1.3.1. Alignment Weighting and Similarity Measures 

Some standard metrics are acknowledged and defined, as in the CommonKADS methodology [11] or the OntoWeb EU thematic network [12], and are partly endorsed by recognized bodies. There have also been works on finding similarities of entities in two ontologies based on their structural positions: Ref. [13] computes the dissimilarity of elements in a hierarchy based on their distance from the closest common parent. The Upward Cotopic distance is introduced by [14] to compute the dissimilarity of entities in ontology hierarchies. The key difference between those works and the current one is that they consider only the structural features of ontologies.

Reference [15] introduces a measure to calculate the similarity of WordNet 2 concepts, i.e., within a single hierarchy. The similarity is computed based on the closest common parent and the distance of the two entities from the root. This work comes closer to ours, but it is quite immature in that it simply presumes a hierarchical structure for every ontology. We understand that this is an engineering assumption. Yet we believe that it is far from correct in reality, and our work makes no such assumption.

1. Unfortunately, mathematically speaking, this is incorrect because no such constraint exists on functions in mathematics.

On the other hand, some methods aim at a trade-off between different features such as efficiency and quality, as in QOM [16], and some have integrated various similarity methods [17]. Unlike them, this work first offers a manifesto of its desired properties and then examines a few solutions that adhere to it.

Besides, compound metrics make use of simple measures by combining them, in the hope of improving the result of the mapping between two ontologies. One approach has been to define each measure as a dimension and compute the Minkowski distance of two objects [18]. As introduced in [18], another approach has been a weighted average of features, in which the weights can even be learned by a machine learning technique. Glue [19] also builds the similarity matrix with a machine learning approach. In APFEL [20], the weights for each feature are calculated using decision trees: the user only has to provide some ontologies with known correct alignments, and the learned decision tree is then used for aggregation and interpretation of the similarities. Ref. [21] introduces a new method for compound measure creation without any need for the mapping extraction phase. It estimates the similarity among entities of two ontologies based on existing transitive relationships across the ontologies.
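The two aggregation styles attributed to [18] can be sketched as follows, assuming each measure returns a dissimilarity in [0, 1] for the same concept pair. This is an illustration, not the implementation from [18]; `minkowski_aggregate`, `weighted_average`, and the example numbers are hypothetical. The Minkowski variant treats the vector of per-measure dissimilarities as a point and returns its p-norm (its distance from a perfect match at the origin), optionally divided by `len(ds) ** (1/p)` so the result stays in [0, 1].

```python
def minkowski_aggregate(ds, p=2.0, normalize=True):
    """Treat each measure's dissimilarity in ds as one coordinate and
    return the Minkowski p-norm of that vector (p >= 1).
    normalize=True divides by len(ds) ** (1/p), the largest value
    possible when every coordinate lies in [0, 1]."""
    if p < 1:
        raise ValueError("p must be >= 1 for a Minkowski distance")
    value = sum(abs(d) ** p for d in ds) ** (1.0 / p)
    return value / len(ds) ** (1.0 / p) if normalize else value


def weighted_average(ds, ws):
    """Weighted average of per-measure dissimilarities; the weights may be
    set by hand or learned."""
    if len(ds) != len(ws) or sum(ws) <= 0:
        raise ValueError("need one weight per measure, with positive sum")
    return sum(w * d for w, d in zip(ws, ds)) / sum(ws)


# Hypothetical lexical and structural dissimilarities for one concept pair.
ds = [0.3, 0.4]
print(minkowski_aggregate(ds, normalize=False))  # about 0.5
print(weighted_average(ds, ws=[1.0, 3.0]))       # about 0.375
```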

1.3.2. Mapping Extraction 

A method for mapping extraction is proposed by [22], which examines linguistic features to compare two ontologies on the basis of an IS-A relationship. Staab et al. [23] have also focused on structural and taxonomic comparison of two trees: to extract an alignment, the dissimilarity of each pair of concepts is calculated based on their superclasses and subclasses. Stumme et al. [24] use shared instances of the two ontologies to be mapped; however, this work ignores the properties of classes. Again, the advantage of our work over these is that it is not biased toward any particular shape of the ontology (as a graph) or any particular usage of labels.

Zhdanova et al. [25] extend the notion of OM to a community-driven approach that enables web communities to establish and reuse OM to achieve an adequate and timely domain representation. Our work, in contrast, does not target any particular domain.

In [26], the applicability of solutions to the Stable Marriage problem [27] for extracting a reasonable alignment is studied. There are other approaches as well; for example, a machine learning approach to the problem is discussed in [4], and Ref. [28] describes a probabilistic model.

Johnson et al. [29] model inter-ontology relationship detection as an information retrieval task, where a relationship is defined as any direct or indirect association between two ontological concepts.

Wang et al. [30] present a specific formalization and algorithm for the local interpretation of shared representations, in order to build global semantic coherence for the distributed actions of individual agents, known as Mutual Online Ontology Alignment.

LOM, as described in [31], is a semi-automatic, lexicon-based ontology-mapping tool that supports a human mapping engineer with a first-cut comparison of ontological terms between the ontologies to be mapped (based on their lexical similarity and simple heuristic methods). Unlike ours, these works mostly disregard the (overall) structure of the ontology.

1.4. This Work 

This paper introduces a new factor called coincidence, which combines ideas from different realms of science and engineering, including Ontology Matching, Graph Homeomorphism, Metric Spaces, and Domain Theory. In simple words, it scores mappings based on how well the ontologies graphically coincide under each mapping. It can therefore be used in phase 2 of an alignment framework. This work enumerates the properties that a measure of this kind should have, and offers one such measure. Then, to demonstrate its use in action, it gives three approaches to mapping extraction based on this measure.

In the simplest form, we generate all possible alignments, score each one with the measure, and finally select those with maximum scores (global maxima). However, this method suffers from exponential runtime and therefore has limited application (to small ontologies). To obtain a more tractable solution for large ontologies that produces a nearly optimal result, we developed a genetic-based algorithm that applies the coincidence measure during the generation of new individuals, so that new generations have better coincidence. We also developed an approximative approach that does not insist on first generating all the alignments and then estimating their scores; instead, it attempts to estimate the mapping with the best coincidence score directly.

To introduce the coincidence measure, the basic mathematical background is explained in Section 2.1, and the corresponding problem is defined formally in Section 2.2. In Section 3, we introduce the measure, starting in Section 3.1 with the intuitions that the solution is based upon. The translation of these intuitions into different possible graph structures comes in Section 3.2. The formulation of a scoring mechanism is explored in Section 3.3, followed by some commentary on the mechanism in Section 3.4. In Section 4, we show how to use the mechanism in three different ways, after first discussing how to reduce complexity for OWL ontologies in Section 4.1. A naive approach is explained in Section 4.2, a genetic-based approach in Section 4.3, and an approximative one in Section 4.4. Finally, we conclude in Section 5.
2. Specification of the Problem

2.1. Mathematical Background 

In this section, we define the mathematical concepts used throughout the paper. The first is the notion of a Metric Space, for which we refer to the definition in [32]:

A set $X$, whose elements we shall call points, is said to be a metric space if with any two points $p$ and $q$ of $X$ there is associated a real number $d(p, q)$, called the distance from $p$ to $q$, such that:

$d(p, q) > 0$ if $p \neq q$; $d(p, p) = 0$; [positiveness]

$d(p, q) = d(q, p)$; [symmetry]

$d(p, q) \leq d(p, r) + d(r, q)$, for any $r \in X$. [triangular inequality]

Any function with these three properties is called a distance function, or metric.
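A phase-1 dissimilarity satisfies positiveness and symmetry but need not satisfy the triangular inequality. Before treating such distances as a metric, one could check this on the finite distance matrix, as in the following minimal sketch. The function `triangle_violations` and the example values (a car named "Jaguar", the animal "Jaguar", and "Tiger") are hypothetical.

```python
from itertools import permutations


def triangle_violations(d, tol=1e-12):
    """Return every ordered triple (p, q, r) of distinct indices for which
    d[p][q] > d[p][r] + d[r][q], i.e. the triangular inequality fails.
    Triples with a repeated index are skipped: given d(p, p) = 0 and
    non-negative entries, the inequality holds for them automatically."""
    n = len(d)
    return [
        (p, q, r)
        for p, q, r in permutations(range(n), 3)
        if d[p][q] > d[p][r] + d[r][q] + tol
    ]


# Hypothetical compound dissimilarities: "Jaguar (car)" is lexically close to
# "Jaguar", which is structurally close to "Tiger", yet the car is far from the tiger.
d = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.1],
    [0.9, 0.1, 0.0],
]
print(triangle_violations(d))   # [(0, 2, 1), (2, 0, 1)]
```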

Another useful piece of theory is the notion of Typed Graphs 3. In general, we call a graph G typed if each of its edges has a type. Formally, we call $G(V, E, T)$ a typed graph if $E: V \times V \to T$, where $T$ is a set of predefined types. An edge $e$ of type $t$ is written $e : t$. A homeomorphism from a typed graph $G(V, E, T)$ to a typed graph $G'(V', E', T)$ is a one-to-one correspondence $m$ between $V$ and $V'$. We call an edge $e(a, b) : t \in E$ preserved under $m$, written $P(e, m)$, iff there is an edge $e'(m(a), m(b)) : t \in E'$. If both $a$ and $b$ are mapped to some vertex in $V'$, yet there is no edge of type $t$ between $m(a)$ and $m(b)$, we call $e$ typelessly preserved and write $TP(e, m)$. If the vertices of a typed graph $G(V, E, T)$ are points of a metric space $(X, d)$, we say that $G$ is embedded in $X$ and write $G(V, E, T, X, d)$.
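The predicates P(e, m) and TP(e, m) can be sketched as follows, assuming typed edges are stored as a dictionary from vertex pairs to a type and the mapping m as a dictionary from V to V'. The names `is_preserved` and `is_typelessly_preserved`, the type label "subClassOf", and the concept names are illustrative. TP follows the literal wording above: both endpoints are mapped and no edge of type t joins their images, which covers both a missing edge and an edge of another type.

```python
from typing import Dict, Hashable, Tuple

Vertex = Hashable
Edges = Dict[Tuple[Vertex, Vertex], str]   # E: (a, b) -> type t


def is_preserved(edge: Tuple[Vertex, Vertex], edges: Edges,
                 edges_p: Edges, m: Dict[Vertex, Vertex]) -> bool:
    """P(e, m): an edge of the same type joins m(a) and m(b) in G'."""
    a, b = edge
    if a not in m or b not in m:
        return False
    return edges_p.get((m[a], m[b])) == edges[edge]


def is_typelessly_preserved(edge: Tuple[Vertex, Vertex], edges: Edges,
                            edges_p: Edges, m: Dict[Vertex, Vertex]) -> bool:
    """TP(e, m): both endpoints are mapped, yet no edge of type t joins
    m(a) and m(b) in G' (no edge at all, or one of another type)."""
    a, b = edge
    if a not in m or b not in m:
        return False
    return edges_p.get((m[a], m[b])) != edges[edge]


G = {("Animal", "Jaguar"): "subClassOf"}
G_p = {("LivingCreature", "Tiger"): "subClassOf"}
e = ("Animal", "Jaguar")
m = {"Animal": "LivingCreature", "Jaguar": "Tiger"}
swapped = {"Animal": "Tiger", "Jaguar": "LivingCreature"}
print(is_preserved(e, G, G_p, m))                    # True
print(is_typelessly_preserved(e, G, G_p, swapped))   # True
```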

Reference [33] defines a Partially Ordered Set as follows:

A set $P$ with a binary relation $\sqsubseteq$ is called a partially ordered set or poset if the following holds for all $x, y, z \in P$:

1. $x \sqsubseteq x$ (Reflexivity)

2. $x \sqsubseteq y \wedge y \sqsubseteq z \Rightarrow x \sqsubseteq z$ (Transitivity)

3. $x \sqsubseteq y \wedge y \sqsubseteq x \Rightarrow x = y$ (Antisymmetry)

We add that $\sqsubseteq$ above is called a partial order.

As the last definition, let us call the set of all directed paths starting from a vertex v the set of v-stems. A path in this set will, analogously, be called a v-stem. Note that an implication of this definition is that v must belong to a directed graph in order to have a stem.
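A minimal sketch of enumerating v-stems by depth-first search follows. It assumes the directed graph is given as an adjacency dictionary and that a stem contains at least one edge; `v_stems` and the example graph are hypothetical. The number of stems can grow exponentially with the depth of the graph, so this is only practical for small graphs.

```python
from typing import Dict, List


def v_stems(adj: Dict[str, List[str]], v: str) -> List[List[str]]:
    """Enumerate the v-stems: every directed path (with at least one edge)
    that starts at v, listed in depth-first order."""
    stems: List[List[str]] = []

    def extend(path: List[str]) -> None:
        for w in adj.get(path[-1], []):
            if w in path:        # skip cycles; never triggers in a DAG
                continue
            longer = path + [w]
            stems.append(longer)
            extend(longer)

    extend([v])
    return stems


adj = {"Thing": ["Animal", "Vehicle"], "Animal": ["Jaguar"]}
print(v_stems(adj, "Thing"))
# [['Thing', 'Animal'], ['Thing', 'Animal', 'Jaguar'], ['Thing', 'Vehicle']]
print(v_stems(adj, "Jaguar"))   # []
```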

2.2. Theoretical Specification of Scoring 

Assume that we are given two ontologies as well as distance values for each pair of concepts across the ontologies. Such distances may have been obtained by applying a (lexical, structural, or compound) measure. The goal is to score mappings (and thereafter alignments) so that one can select a best or near-best alignment among all the available possibilities. We formulate this problem as follows:

Input: A pair of ontologies, and a matrix whose rows and columns stand for concepts from one ontology and concepts from the other, respectively. Each cell holds the distance between the corresponding concepts.

From our point of view, this input is interpreted as a pair of directed acyclic graphs embedded in a metric space. So, naming the input ontologies O and O', we do not distinguish them from G(V,E,T,X,d) and G'(V',E',T,X,d), respectively.

Output: A scoring of possible alignments that can help toward better extraction. From our point of view, this is a partial order on the possible homeomorphisms between G and G'.

To produce the above output, this paper first enumerates a list of rationales for the above partial order and then presents one possible candidate for it. This leads to a straightforward yet inefficient solution. We then discuss possible axes along which one can tune it, and add two related solutions that overcome the complexity of the first one.

We should mention that, although some experts may consider our work a method for mapping extraction, we believe that this part offers a new criterion that helps in deciding better on extraction, as opposed to performing extraction itself.



3. The Partial Order 

In this section, we first give an intuition for our method, 
translate that intuition to various graph patterns and finally 
give a precise specification of our scoring mechanism. 

3.1. Intuition 

Let us set OA aside for a moment and consider the following basic geometry problem: when do we call a pair of triangles the same? When they are equal in the geometric sense? For example, do we consider the two triangles in Fig. 2a the same? We doubt it 4. Now, look at Fig. 2b (the solid lines indicate one triangle and the dotted ones another, while the vertices of the triangles coincide). To our understanding, we human beings consider these two triangles the same. Now, introducing the cases of Figs. 2c and 2d into the comparison, and trying to give a fuzzy interpretation to the concept of "being the same", or coincidence, it should be said that the two triangles in Fig. 2d are more the same than those of Fig. 2c, and the two of Fig. 2b coincide even more.

Back in the realm of OA, we should say that approaches concerned merely with the structure of ontologies are imprecise in that they fail to distinguish between the two triangles in Fig. 2a. That is to say, as Fig. 3 depicts, those approaches tend to reduce both G and G' to the same graph (G''). This is obviously a significant loss of information, because G and G' will be interpreted as similar ontologies while they describe totally different worlds.

Fig. 2. Matching of shapes.

Fig. 3. Structure is not enough.

3. There is no consensus in mathematics on this name.

4. In mathematical topology, these two triangles are the same in that there exists a continuous bijection between them whose inverse is also continuous.

As the next step toward understanding the notion of coincidence and its use in OA, consider Fig. 4, where all the points are assumed to lie in a metric space. Suppose we want to estimate how much the two triangles of part (i), namely ABC and A'B'C', coincide. One may find it trivial that this is a function of $d(A, A') + d(B, B') + d(C, C')$, where $d$ is the metric of our metric space. This means that we automatically tend to pair A with A', B with B', and C with C'. The reason is that, by merely pairing each vertex with its closest counterpart in the other triangle, the overall distance between the two triangles is minimized too. That is to say, human beings naturally do not estimate the distance between the two triangles by considering $d(A, B') + d(B, A') + d(C, C')$, because this latter sum would needlessly be larger than the former.

Considering the same problem for the triangle and the pentagon in Fig. 4(ii) is not as trivial. One has to be careful about how to pair up the vertices so that the overall sum is minimized, because each choice affects the rest of the vertices too. The problem becomes more severe when dealing with complicated shapes with a large number of vertices. This is where the question of how to pair up the vertices, i.e., mapping and alignment, becomes key. One can now observe that different alignments can affect how the coincidence of ontologies is interpreted. For that reason, Section 3.2 lists the properties expected from a good interpretation of the degree of coincidence of two graphs. As it turns out, those properties depend on the mappings and will therefore help us identify the alignments that give a better understanding of the coincidence of two ontologies.

Fig. 4. The impact of the correct choice for mapping.

Fig. 5. Coincidence is not only being close.
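The vertex-pairing problem of Fig. 4(ii) can be illustrated with a small brute-force sketch. It assumes the shapes lie in the Euclidean plane and that only the sum of vertex-to-vertex distances matters (edges are ignored). `best_pairing` and the coordinates are hypothetical. Even for a triangle and a pentagon there are 5 · 4 · 3 = 60 injective pairings to try, which hints at why exhaustive search only suits small inputs.

```python
import math
from itertools import permutations


def best_pairing(small, large):
    """Exhaustively pair every vertex of `small` with a distinct vertex of
    `large` so that the sum of Euclidean distances is minimal."""
    best_cost, best = math.inf, None
    # Each permutation is an injective choice of images for small's vertices.
    for image in permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(image))
        if cost < best_cost:
            best_cost, best = cost, image
    return best, best_cost


triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]                          # A, B, C
pentagon = [(2.0, 3.2), (0.1, 0.0), (4.2, 0.1), (3.0, 2.0), (1.0, 2.0)]
pairs, cost = best_pairing(triangle, pentagon)
print(pairs)   # (1, 2, 0): A -> (0.1, 0), B -> (4.2, 0.1), C -> (2, 3.2)
```

For this pure vertex-distance sum, the problem is a linear assignment problem that SciPy's `scipy.optimize.linear_sum_assignment` solves in polynomial time. The weight proposed in Section 3.3 depends on pairs of mapped vertices (edges), so that shortcut does not carry over directly.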

This is what we intend to bring into the world of OA. That is, given that phase one of OA gives us a measure of similarity of concepts across the ontologies, we take this measure as an estimate of the distance between each pair of points (i.e., concepts), and adapt it to estimate the extent to which the two ontologies, as whole graphs, coincide. To this end, we first offer an estimate of the extent of coincidence between two edges, and then accumulate all of these into our final estimate of the coincidence of the two ontologies.

It may be tempting to forget about the differences between Metric and Cartesian Spaces and mistakenly think of coincidence as merely being close. With that misconception, one might decide to define coincidence in terms of a centroid. Regardless of the technical difficulties of defining a centroid in a Metric Space, this approach does not describe coincidence. In Fig. 5, for example, although the centroids of the two triangles in part (i) exactly coincide, the triangles themselves do not. A comparison between this part and part (ii) reveals that, even though there is a distance between the centroids of the latter pair of triangles, they coincide more. This observation tells us that coincidence is a direct function of all the pairwise distances of the nodes rather than of a single representative (such as a centroid).
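The Fig. 5 argument can be reproduced numerically under one stated assumption: the triangles live in the Euclidean plane, where the centroid is simply the mean of the vertices. In the hypothetical sketch below, rotating a triangle by 180 degrees about its centroid keeps the centroid fixed while the vertices move far apart, whereas a small translation moves the centroid but keeps the vertices nearly coincident. The helpers `centroid` and `min_vertex_distance_sum` are illustrative.

```python
import math
from itertools import permutations


def centroid(points):
    """Arithmetic mean of 2-D points (the Euclidean centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def min_vertex_distance_sum(p, q):
    """Smallest sum of vertex-to-vertex distances over all pairings of two
    equally sized vertex lists."""
    return min(
        sum(math.dist(a, q[j]) for a, j in zip(p, perm))
        for perm in permutations(range(len(q)))
    )


tri = [(0.0, 0.0), (6.0, 0.0), (0.0, 3.0)]
cx, cy = centroid(tri)
rotated = [(2 * cx - x, 2 * cy - y) for x, y in tri]   # 180-degree turn about the centroid
shifted = [(x + 0.3, y) for x, y in tri]               # small translation

print(centroid(rotated) == centroid(tri))          # True: centroids coincide
print(min_vertex_distance_sum(tri, rotated) > 5)   # True: the triangles do not
print(min_vertex_distance_sum(tri, shifted) < 1)   # True: these nearly coincide
```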

The astute reader may wonder what technical difficulties defining a centroid for ontologies might involve. Here is an interesting one. First, a centroid is usually understood as a point that is, in some averaged sense, central with respect to all the points of a shape (in Euclidean space, the arithmetic mean of the points). Then, what is the interpretation of such a point for ontologies, if any? Furthermore, even assuming that, for every ontology, we can find a proper point in our Metric Space with such a property, it carries no significantly meaningful information about the other ontologies. For example, if $O \subset O'$ and the majority of the concepts of $O'$ are far away from $O$, so will be the centroid of $O'$. In this case, considering $O$ and $O'$ to be non-coinciding is an obvious mistake, yet the distance between the centroids will be significant. The technical difficulties of centroids in Metric Spaces are not limited to OA. For instance, one can refer to [34] for an extensive list of such difficulties in capturing the proximity of webpage elements.




Fig. 6. Properties of metrics. 



3.2. Properties of the Desired Partial Order 

Here we present a set of properties that we believe any partial order for our problem should satisfy, along with our reasons for these beliefs. Our proposed partial order is in fact induced by a weight function for matchings, so hereafter we use weight in its place. The properties are divided into six categories, based upon the preservation of the edge (under the correspondence) and upon how close its endpoints are to their images.

In all categories of Fig. 6, O and O' are the input ontologies, a and b are concepts in O, and a' and b' are concepts in O'. The closer a pair of concepts is depicted in the figures, the closer the concepts are intended to be in (X,d). (That is, the closer a and a' are shown in the figures, the smaller d(a,a') is.) We do not force the ontologies to be disjoint, so, in each figure, the regions of the ontologies may overlap. Furthermore, in each figure, the arrows show mappings (that is, the source of an arrow is mapped to its destination), and the lines, solid or dotted, show the edges of the graphs (solid lines show the edges between a and b, and dotted lines show the edges between a' and b').

Category I. Here, a and a' are very close, as are b and b'. The fact that (a,b) is preserved is of great importance to us because it means that the two edges coincide closely. So, we want this preserved edge to bring a large weight. To justify this, consider the case when a and b are "Animal" and "Jaguar", respectively, and a' and b' are "Living Creature" and "Tiger". The fact that there is an edge (of type rdf:type) both between a and b and between a' and b' strongly suggests that the two ontologies may be describing the same world.

Category II. In this category, the edge is preserved, but only one endpoint of the edge is close to its image. As an example of such a case, consider O describing a zoo and O' a museum. Furthermore, suppose that a and b are "Elephant" and "4-legged", and a' and b' are "Mammoth" and "Ancient Creature". One interpretation is that although O and O' describe two different worlds, they perhaps coincide "from the side of a". Therefore, we would like such cases to get a moderate weight, i.e., smaller than in the previous case.

Category III. The third category is the one where an edge is not preserved while the relevant concepts are very close. Consider, e.g., O describing glazing technology, while O' is the ontology of a simple glasses manufacturing studio. In this respect, a and b could be "Glass" and "Frame", and a' and b' the same, respectively. Of course, d(a,a') and d(b,b') may both be very small here. We consider the non-preservation of the edge a negative point, but because the vertices coincide, we do not penalize this matching much. This is reasonable because the closeness of (a,a') and (b,b') means that the edge (a',b') has perhaps been mistakenly omitted.

Category IV. Next, we come to the category where an edge is not preserved, while only one endpoint of the edge is very close to its image. A mapping that does this is probably making a mistake, but not as big a one as in category VI. So, we do not penalize it too much. As an example of such a case, suppose O describes a glasses manufacturing studio and O' a car factory. Assume that a is "Glasses" and a' is "Glass", while b could be "Frame" and b' "Chassis". Just as category III is in a sense the dual of category I, this category can be considered the dual of category II.

Category V and VI. A preserved edge certainly increases the likelihood of preservation of shape for the two entire graphs. However, if neither endpoint of the edges is close to its image, the two edges do not coincide much. Such a preservation is therefore not a great success, because it does not greatly help the coincidence of the two ontologies. In other words, although preservation of shape (as depicted in Fig. 2a) is partly important, we do not care much about it if the edges coincide at neither end. As an example of when this looks reasonable, consider the case when a is "Vehicle", b is "4-wheeled", a' is "Animal", and b' is "4-legged". Therefore, for category V, we would like the mapping to receive a low benefit. The situation of category VI is analogous, so we do not try to justify why mappings of that category will be penalized to a large extent.

Table 1 summarizes the above manifesto about the six categories, along with our suggested treatment for each case.

Table 1. The six categories and their treatments.

Type of edge / Proximity | Both ends close   | One end close       | Neither end close
Preserved                | High benefit (I)  | Modest benefit (II) | Low benefit (V)
Not preserved            | Low penalty (III) | Modest penalty (IV) | High penalty (VI)

Fig. 7. w_r and w_l are not the same.
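To read Table 1 mechanically, the following hypothetical helper classifies a single edge (a, b) of O by whether it is preserved and by how many endpoints lie close to their images. The cut-off `close` is an assumption introduced only for this sketch; the weight of Section 3.3 is continuous in the distances and needs no such threshold.

```python
def category(preserved: bool, d_a: float, d_b: float,
             close: float = 0.3) -> tuple:
    """Map one edge (a, b) of O to its Table 1 cell.

    preserved -- whether the edge is preserved under the mapping m
    d_a, d_b  -- d(a, m(a)) and d(b, m(b))
    close     -- hypothetical cut-off below which an endpoint counts as
                 'close' to its image
    """
    n_close = int(d_a <= close) + int(d_b <= close)
    if preserved:
        table = {2: ("I", "high benefit"), 1: ("II", "modest benefit"),
                 0: ("V", "low benefit")}
    else:
        table = {2: ("III", "low penalty"), 1: ("IV", "modest penalty"),
                 0: ("VI", "high penalty")}
    return table[n_close]


# Zoo/museum example of Category II: a is close to a', b is far from b'.
print(category(True, 0.2, 0.8))   # ('II', 'modest benefit')
```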

3.3. Our Proposed Partial Order 

Adding the requirement that the weighting system be symmetric in its arguments, we observed that one possible such weighting is the following 5, in which $v_1, v_2 \in G$. (By being symmetric in its arguments, we mean $w(m(G, G')) = w(m^{-1}(G', G))$.)

$$w(m) = w_o(m) - w_l(m) - w_r(m),$$

where

$$w_o(m) = \sum_{P((v_1, v_2), m)} \big( f_m(v_1) + f_m(v_2) \big),$$

$$w_l(m) = \sum_{TP((v_1, v_2), m)} \big( g_m(v_1) + g_m(v_2) \big),$$

$$w_r(m) = \sum_{TP((m(v_1), m(v_2)), m^{-1})} \big( g_m(v_1) + g_m(v_2) \big),$$

$$f_m(x) = \frac{1}{f(d(x, m(x)))}, \qquad g_m(x) = g(d(x, m(x))).$$

The interpretations of $w_o$, $w_l$, and $w_r$ are:

• $w_o$: This part of the formula accumulates the coincidence score of all the edges of G preserved under m.

• $w_l$: This part accumulates the coincidence that is lost because of typeless preservation of edges of G under m. And, finally,

• $w_r$: This is the accumulated loss of coincidence because of typeless preservation of edges of G' under $m^{-1}$.
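The weight can be sketched directly from the formulas for $w_o$, $w_l$, and $w_r$, assuming typed edges are stored as dictionaries from vertex pairs to types and m is a one-to-one dictionary. This is an illustrative reading, not the authors' implementation: `coincidence_weight`, `distance`, `f_example`, and `g_example` are hypothetical, and edges touching unmapped vertices are simply skipped here, since the treatment via an imaginary vertex is not modeled. The example reproduces a Fig. 7-style case (same endpoints, $t \neq t'$), where one penalty lands in $w_l$ and one in $w_r$.

```python
from typing import Callable, Dict, Hashable, Tuple

Vertex = Hashable
Edges = Dict[Tuple[Vertex, Vertex], str]   # typed edges: (v1, v2) -> type


def coincidence_weight(edges: Edges, edges_p: Edges, m: Dict[Vertex, Vertex],
                       d: Callable[[Vertex, Vertex], float],
                       f: Callable[[float], float],
                       g: Callable[[float], float]) -> float:
    """w(m) = w_o(m) - w_l(m) - w_r(m) for edges of G and G'."""
    m_inv = {image: v for v, image in m.items()}

    def f_m(x: Vertex) -> float:
        return 1.0 / f(d(x, m[x]))

    def g_m(x: Vertex) -> float:
        return g(d(x, m[x]))

    w_o = w_l = 0.0
    for (v1, v2), t in edges.items():
        if v1 not in m or v2 not in m:
            continue                    # unmapped vertices are not modeled here
        if edges_p.get((m[v1], m[v2])) == t:
            w_o += f_m(v1) + f_m(v2)    # P((v1, v2), m)
        else:
            w_l += g_m(v1) + g_m(v2)    # TP((v1, v2), m)

    w_r = 0.0
    for (u1, u2), t in edges_p.items():
        if u1 not in m_inv or u2 not in m_inv:
            continue
        v1, v2 = m_inv[u1], m_inv[u2]
        if edges.get((v1, v2)) != t:
            w_r += g_m(v1) + g_m(v2)    # TP((m(v1), m(v2)), m^-1)

    return w_o - w_l - w_r


PAIR_DIST = {frozenset({"a", "a'"}): 0.2, frozenset({"b", "b'"}): 0.4}


def distance(x: Vertex, y: Vertex) -> float:
    return PAIR_DIST[frozenset({x, y})]       # symmetric by construction


def f_example(x: float) -> float:
    return 0.1 + x                            # hypothetical, strictly increasing, >= 0.1


def g_example(x: float) -> float:
    return x                                  # hypothetical, strictly increasing


m = {"a": "a'", "b": "b'"}
print(coincidence_weight({("a", "b"): "t"}, {("a'", "b'"): "t'"},
                         m, distance, f_example, g_example))   # about -1.2
```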



5. For a note on how to prevent this formula from approaching infinity, please refer to Section 3.4.



Although $w$ is symmetric in its arguments, $w_r$ and $w_l$ are not the same. Fig. 7 shows an example where $t \neq t'$: the typeless preservation of the edge $e : t \in G$ is counted only in $w_l$ (and not in $w_r$). Likewise, the typeless preservation of the edge $e' : t' \in G'$ is counted only in $w_r$. As a result, the mapping receives two penalties for these two edges, one for $e$ and another for $e'$.
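A minimal Python sketch of $w(m) = w_o(m) - w_l(m) - w_r(m)$ may help make the bookkeeping concrete. It assumes directed typed edges stored as (source, target, type) triples; it reads "preserved" as "an edge of the same type and direction joins the images" and typeless preservation (TP) as "some other edge, of another type or in the reverse direction, joins them"; and it lets unmapped vertices contribute nothing. These readings, and the names `coincidence_weight` and `linked`, are our assumptions for illustration, not a definitive implementation. The usage lines mirror Fig. 7: with $t \neq t'$, the edge $e$ is charged in $w_l$ and $e'$ in $w_r$.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (source, target, edge type)


def linked(edges: Set[Edge], a: Hashable, b: Hashable) -> bool:
    """True if some edge joins a and b, in either direction and with any type."""
    return any((s, t) in ((a, b), (b, a)) for s, t, _ in edges)


def coincidence_weight(G: Set[Edge], Gp: Set[Edge], m: Dict[Hashable, Hashable],
                       dist: Callable[[Hashable, Hashable], float],
                       f: Callable[[float], float],
                       g: Callable[[float], float]) -> Tuple[float, float, float, float]:
    """Return (w, w_o, w_l, w_r) for a partial, injective mapping m from G to G'."""
    m_inv = {y: x for x, y in m.items()}
    w_o = w_l = w_r = 0.0

    # Edges of G under m.
    for v1, v2, t in G:
        if v1 not in m or v2 not in m:
            continue  # an unmapped endpoint contributes nothing
        a, b = m[v1], m[v2]
        if (a, b, t) in Gp:            # preserved with its type: f_m = 1/f(d)
            w_o += 1 / f(dist(v1, a)) + 1 / f(dist(v2, b))
        elif linked(Gp, a, b):         # only typelessly preserved: g_m = g(d)
            w_l += g(dist(v1, a)) + g(dist(v2, b))

    # Edges of G' under m^{-1}: only typeless preservation is counted here.
    for y1, y2, t in Gp:
        if y1 not in m_inv or y2 not in m_inv:
            continue
        x1, x2 = m_inv[y1], m_inv[y2]
        if (x1, x2, t) not in G and linked(G, x1, x2):
            w_r += g(dist(x1, y1)) + g(dist(x2, y2))

    return w_o - w_l - w_r, w_o, w_l, w_r


# Fig. 7-style case: e : t in G is mapped onto e' : t' in G', with t != t'.
G = {("a", "b", "t")}
Gp = {("a'", "b'", "t'")}
m = {"a": "a'", "b": "b'"}
d = {("a", "a'"): 0.2, ("b", "b'"): 0.3}
w, w_o, w_l, w_r = coincidence_weight(G, Gp, m, lambda x, y: d[(x, y)],
                                      f=lambda x: x + 0.1, g=lambda x: x)
print(w_o, w_l, w_r, w)  # e is charged in w_l, e' in w_r
```

Changing the type of $e'$ to $t$ turns both penalties into a single $w_o$ contribution, which is the contrast the text draws between preserved and typelessly preserved edges.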

We make no claim about the particular choice of the functions $f$ and $g$, because they are meant to be tuned experimentally; that is, they can be regarded as normalization functions. Their common property is that both are strictly increasing; otherwise, one can always find one of the six categories above in which $w$ misbehaves. Furthermore, $f$ should have one more property: its range should stay outside some neighborhood of the origin. To see how violating this causes misbehavior, consider a pair of ontologies across which there exists a pair of concepts at distance 0. If $f(0)$ is 0, then $w$ becomes $+\infty$ regardless of the rest of the alignment. This is clearly a significant anomaly, because it makes a large class of alignments look the same although they are not inherently the same; in such a case, $w$ does little to discriminate among a large class of alignments.

When a vertex is not mapped to anything, none of the edges from or to that vertex is preserved. In these cases, the alignment should get more weight than one which maps such edges to edges of the wrong types. To tune our formula to reflect this, one can virtually consider the vertex as mapped to an imaginary vertex whose existence gives us no information. In this case, its distance ought to be from any other concept. One can easily verify that the above weight satisfies all the conditions enumerated. As a further benefit of our proposed weighting method, note that, for cases like that of Fig. 7, it penalizes $m$ twice: once because $e$ is not preserved, and again for $e'$.

The case becomes even more interesting when $e$ : subClassOf and $e'$ : subClassOf. Here, our weight recognizes that an alignment which maps the endpoints of $e$ to two concepts between which there is no edge at all is better than one which maps them to a pair of concepts joined by an edge of the inverse type.
3.4. Commentary

As said before, there are cases in which the input matrix does not define a metric space. In fact, as said in Section 2.1, a metric is required to be symmetric. However, as listed in [5], there are schema-based matching techniques which use linguistic resources, and these techniques may not satisfy this property. For example, in the Webster Collegiate Dictionary [35], "quick" is in 12th place in the list of synonyms of "swift", while "swift" is second in the list of synonyms of "quick". In such a case, symmetry may not hold, and what we get may be a Quasi-Metric Space [36] rather than a metric space. However, as [3] also mentions, only a few authors consider similarity measures that lack symmetry. So the existing weighting formula, and the assumption behind it, will almost always hold. Even when an application inherently lacks symmetry, a small tweak to the formula yields a symmetric weighting formula which still satisfies all the conditions listed in Section 3.3:

$$w'(m) = w_o(m) - w_l(m) - w'_r(m),$$

where $w_o(m)$ and $w_l(m)$ remain the same, but

$$w'_r(m) = \sum_{TP((m(v_1),m(v_2)),\,m^{-1})} \big(g_{m^{-1}}(m(v_1)) + g_{m^{-1}}(m(v_2))\big), \qquad g_{m^{-1}}(y) = g(d(y, m^{-1}(y))).$$

Furthermore, there seems to be no way to guarantee that the triangle inequality holds for every output of phase 1. Nevertheless, it seems quite reasonable to assume that this property holds for any such guess; in fact, we believe that finding a real guess in which it does not hold is unlikely.

Another question which may arise concerns complexity. It can easily be shown that naively using these formulas requires an exhaustive search; finding the best mapping directly is not known to be solvable in polynomial time. Suppose, on the contrary, that it could be done efficiently. Then one would obtain an efficient way of solving the graph isomorphism problem: given a pair of (untyped) graphs (not embedded in a metric space), assign a fixed type t to all of the edges, embed them in a metric space in which the distance between any two distinct points is 1, and run our algorithm on them in efficient time. The heaviest matchings can then be efficiently checked for being an isomorphism, because one can remove the types and the metric-space backbone. It is easy to verify that the original graphs are isomorphic iff the correspondence with the biggest weight is an isomorphism between them. This would give an efficient way of solving the graph isomorphism problem, i.e., it would show that the latter problem is in P, which is not known.

So far, we have assumed that, to consider all possible matchings, one iterates through the alignments until all of them have been examined, which means the algorithm runs for an exponential number of iterations. Nevertheless, considering all the possible matchings is not necessary. As Papadimitriou and Steiglitz show in [8], there exist heuristics for dealing with this in polynomial time. For the moment, however, we do not consider those heuristics. Even so, we do not leave this problem in its general form: we believe that the OM-specific heuristics presented in Section 4.1 can decrease the runtime. However, to fully cope with the exponential nature of exhaustive search, one needs more elaborate approaches, such as the ones offered in Sections 4.3 and 4.4.



4. Alignment Selection 

In this section, we describe three possible ways of using the proposed solution for alignment selection (also referred to as mapping extraction). First, we describe some heuristics for decreasing the runtime. Based on them, a naive approach is explained. Next, toward more elaborate solutions, a GA approach is introduced. The section concludes with an approximative approach.

4.1. Heuristics for Decreasing the Runtime 

All the heuristics presented here are based on the types of edges. The following list gives the whole idea (let us call this list the recipes for discard and contraction). For the first and third items of the list, we change the initial graph by contracting certain parts of it, apply our refinement method to the resulting reduced graph, and finally transform the graph back to its original form. We then complete the proposed mappings by bringing back into consideration the parts neglected while the graph was in its contracted form. We call this restoration of contracted vertices the expansion phase.

• IS-A (rdfs:subClassOf): Contract every IS-A path into a pair of vertices joined by an edge of type IS-A. The source of this edge is the source of the original path, while the destination is a new vertex whose similarity is the maximum of the similarities along the original path, excluding the source. In the expansion phase, treat this as an independent matching problem, subject to the observation made after this list.

• Disjoint (owl:disjointWith): If the distances of a concept in one ontology from a pair of disjoint concepts in the other differ by more than a certain threshold, remove the possibility of mapping the first concept to the farther concept of the pair.

• Equivalence (owl:equivalentClass): Contract all such vertices into one vertex representing the whole group, and assign the maximum similarity of the group to this new vertex. On expansion, all choices of matching among the group members are equally good.

• owl:functionalProperty: Functional properties should be mapped to functional properties, so discard all the alignments for which this does not hold.

Table 2. Frequency of OWL (and RDF) properties.

Fig. 8. Notes on the expansion phase of IS-A (panels I and II).

• rdfs:domain: If two properties across the ontologies have disjoint classes as their domains, discard all the alignments which map them to each other. Here, "disjointness" may be inferred from several indications; for example, the distance between the classes may exceed a certain threshold. As an example in which inference might also be involved, consider the question of mapping $p_1 \in O_1$ and $p_2 \in O_2$ to each other, where the domains of $p_1$ and $p_2$ are $C_1$ and $C_2$ respectively, $C_1$ is owl:disjointWith $C'$, and $d(C_1, C_2) > M$ ($M$ being a certain threshold).

• owl:intersectionOf: Discard all the alignments that map to each other two classes defined as intersections involving disjoint classes. For instance, if $O_1 \ni C_1 = \bigcap_{i=1}^{n} C_{1i}$ and $O_2 \ni C_2 = \bigcap_{j=1}^{m} C_{2j}$, and we know that for some $i \in \{1, \ldots, n\}$ and some $j \in \{1, \ldots, m\}$, $C_{1i}$ and $C_{2j}$ are disjoint, we should discard all the alignments which map $C_1$ and $C_2$ to each other. ("Disjointness" here is meant in the sense described for rdfs:domain.)
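Two of the discard recipes can be sketched as filters over a set of candidate pairs. The sketch assumes that candidates are (entity of O1, entity of O2) pairs, that a complete distance table is available, that the sets of functional properties of both ontologies are known, and that the threshold `tau` is a tuning parameter. It reads the functional-property rule symmetrically and applies the disjointness rule from the O1 side only; these choices and all function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (entity of O1, entity of O2)


def discard_functional(cands: Set[Pair], func1: Set[str], func2: Set[str]) -> Set[Pair]:
    """owl:functionalProperty recipe, read symmetrically: a pair survives only
    if both sides are functional or neither is."""
    return {(p, q) for p, q in cands if (p in func1) == (q in func2)}


def discard_disjoint(cands: Set[Pair], disjoint2: Iterable[Pair],
                     dist: Dict[Pair, float], tau: float) -> Set[Pair]:
    """owl:disjointWith recipe (O1 side only): if a concept's distances to two
    disjoint O2 concepts differ by more than tau, drop the farther one."""
    kept = set(cands)
    disjoint2 = list(disjoint2)  # allow any iterable to be scanned repeatedly
    for c in {p for p, _ in cands}:
        for x, y in disjoint2:
            dx, dy = dist[(c, x)], dist[(c, y)]
            if abs(dx - dy) > tau:
                kept.discard((c, x) if dx > dy else (c, y))
    return kept


cands = {("Car", "Auto"), ("Car", "Animal"), ("hasOwner", "owner"), ("hasOwner", "tag")}
dist = {("Car", "Auto"): 0.1, ("Car", "Animal"): 0.8,
        ("hasOwner", "Auto"): 0.5, ("hasOwner", "Animal"): 0.5}
kept = discard_disjoint(cands, [("Auto", "Animal")], dist, tau=0.3)
kept = discard_functional(kept, func1={"hasOwner"}, func2={"owner"})
print(sorted(kept))  # [('Car', 'Auto'), ('hasOwner', 'owner')]
```

Because the filters only shrink the candidate set, they can be chained in any order before weighting.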

As far as the authors can tell, all of the above heuristics should immediately seem rational except the first one. For an intuition about the contraction, one can think of it as Query Expansion in Information Retrieval [37] terminology. The expansion, however, is a little tricky, and a subtle observation must be made about IS-A paths:

Consider Fig. 8(I), in which, after expansion, a is chosen to be mapped to a', and b to b'. Here, there remains no choice for c. Now consider Fig. 8(II), in which a is mapped to b'. Note that because b IS-A(n) a and b' IS-A(n) a', it is not correct to map b to a', and there remains no choice for either b or c. With this scheme in mind, the expansion becomes trivial to solve, and its complexity is small, say O(n). We defer the details to Section 4.4, where a related discussion appears.

A question which may arise here is: "Why are only a few properties chosen among the set of all OWL and RDF ones?" The reason behind this choice is a survey we conducted on a set of 545 ontologies. Table 2 shows the results of this survey (NoU = number of usages, PI+ = percentage of usage including IS-A, PI- = percentage of usage excluding IS-A).

Property                         NoU       PI+      PI-
owl:incompatiblewith             0         0.00     0.00
owl:alldifferent                 13        0.01     0.01
owl:differentfrom                13        0.01     0.01
rdfs:datatype                    11        0.00     0.01
owl:symmetricproperty            27        0.01     0.02
owl:sameas                       43        0.02     0.03
owl:equivalentproperty           70        0.03     0.05
owl:inversefunctionalproperty    100       0.04     0.08
owl:thing                        233       0.09     0.18
owl:transitiveproperty           266       0.11     0.21
owl:oneof                        313       0.12     0.24
owl:maxcardinality               807       0.32     0.63
owl:inverseof                    932       0.37     0.73
owl:mincardinality               1315      0.52     1.02
owl:unionof                      1629      0.65     1.27
owl:cardinality                  2416      0.96     1.88
owl:allvaluesfrom                2841      1.12     2.21
rdfs:subpropertyof               2893      1.15     2.25
owl:equivalentclass              4836      1.91     3.76
owl:functionalproperty           7625      3.02     5.93
owl:disjointwith                 7892      3.12     6.14
rdfs:domain                      8476      3.36     6.59
owl:intersectionof               9482      3.75     7.38
owl:somevaluesfrom               22874     9.06     17.79
owl:restriction                  53440     21.16    41.57
rdfs:subclassof                  124005    49.1     -
Sum                              252552    -        -
Sum without subclass of          128547    -        -

The authors believe that, according to that table, the usage percentages of the properties listed above owl:equivalentClass are too low to be worth dedicated heuristics. However, we do not provide any heuristics for owl:Restriction and owl:someValuesFrom either.
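The PI+ and PI- columns of Table 2 are NoU divided by the two sums (252,552 usages in total, and 128,547 without rdfs:subClassOf). The sketch below recomputes them for the lower rows of the table and lists the properties whose PI- reaches that of owl:equivalentClass; rounding to two decimals is our assumption about how the table was produced.

```python
# Lower rows of Table 2: property -> number of usages (NoU).
NOU = {
    "owl:inverseof": 932,
    "owl:mincardinality": 1315,
    "owl:unionof": 1629,
    "owl:cardinality": 2416,
    "owl:allvaluesfrom": 2841,
    "rdfs:subpropertyof": 2893,
    "owl:equivalentclass": 4836,
    "owl:functionalproperty": 7625,
    "owl:disjointwith": 7892,
    "rdfs:domain": 8476,
    "owl:intersectionof": 9482,
    "owl:somevaluesfrom": 22874,
    "owl:restriction": 53440,
    "rdfs:subclassof": 124005,
}
TOTAL = 252552         # "Sum" row
TOTAL_NO_ISA = 128547  # "Sum without subclass of" row


def percentages(nou: int) -> tuple:
    """(PI+, PI-) in percent, rounded to two decimals."""
    return round(100 * nou / TOTAL, 2), round(100 * nou / TOTAL_NO_ISA, 2)


# Keep every property (other than rdfs:subclassof itself) whose PI- is at
# least that of owl:equivalentclass.
cutoff = percentages(NOU["owl:equivalentclass"])[1]
selected = [p for p, n in NOU.items()
            if p != "rdfs:subclassof" and percentages(n)[1] >= cutoff]
print(cutoff, selected)
```

The resulting list contains the properties that received a recipe above (rdfs:subClassOf is filtered out here and handled as IS-A), plus owl:restriction and owl:somevaluesfrom, which are excluded for the reasons given next.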

The reason why we offer no heuristics for owl:Restriction is that it is too general: users may decide to use it for many reasons, and there is no guarantee that those reasons imply any degree of relevance for the properties they restrict. It might be tempting to discard the mappings which map restricted properties (those qualified with owl:Restriction) to non-restricted ones (those not qualified with owl:Restriction). Unfortunately, this would be wrong, because whether or not an ontology restricts a property may well be a mere matter of its area of interest. For instance, an ontology describing a plant transportation business may not be interested in the color of the plants it transports. On the other hand, a decoration company which is a customer of the plant transporter may well have such an interest. They obviously share an object of interest (plants), but the latter chooses to restrict that object in its ontology while the former does not, and discarding the mappings which map Plant to Plant across the two ontologies would be a mistake.

We also give no heuristics for owl:someValuesFrom. We adopt this policy because we believe the W3C's
Algorithm 1. Naive approach.

1: Input $O$ and $O'$.
2: Apply a Threshold-Based Refinement on $O$ and $O'$.
3: Apply the recipes for discard and contraction on $O$ and $O'$; call the resulting ontologies $O_1$ and $O'_1$, respectively.
4: Weight all remaining possible mappings from $O_1$ to $O'_1$.
5: Expand back the contracted parts of $O_1$ and $O'_1$.
6: Output the mappings along with their weights.




Table 3. Distances between nodes before (left) and after (right) contraction.

Left (before contraction):
        n      o      p
b       0.9    0.1    0.4
c       0.6    0.7    0.1
d       0.4    0.5    0.6
e       0.4    0.5    0.4

Right (after contraction):
         (p,n)    (o)
(b,d)    0.9      0.5
(b,e)    0.9      0.5
(c)      0.6      0.7



Table 4. The example after contraction.

#    First pair       Second pair     w_o              w_l    w_r    Score
1    (b,d), (p,n)     (b,e), (o)      0                1.4    1.4    -2.8
2    (b,e), (p,n)     (b,d), (o)      0                1.4    1.4    -2.8
3    (b,d), (p,n)     (c), (o)        0                1.6    1.6    -3.2
4    (c), (p,n)       (b,d), (o)      1/0.6 + 1/0.7    0      0      3
5    (b,e), (p,n)     (c), (o)        0                1.0    1.0    -2.0
6    (c), (p,n)       (b,e), (o)      1/0.6 + 1/0.7    0      0      3



Fig. 9. The example: (I) before contraction, (II) after contraction, (III) final mapping.



description of this property restriction specifier [38] is rather tricky. Understanding it correctly therefore requires a fairly good understanding of mathematical logic, which cannot realistically be assumed of every ontology's authors. The problem becomes more severe when one realizes that the mere phrase "some values from" does not inherently indicate that the class it describes must have the particular property that the W3C describes.

4.2. Naive Approach 

Algorithm 1 gives pseudo-code for mapping extraction based on the mapping scoring mechanism explained above. As an example of how it works, consider Fig. 9.

Figure 9(I) shows the two ontologies O and O'. Accordingly, the left half of Table 3 shows d, the distance between concepts across them. Performing the third step of Algorithm 1 results in Fig. 9(II) and changes d accordingly, as shown in the right half of Table 3. To clarify the values in the table, we explain how the distance between (b,d) and (p,n) is calculated. According to the left half of Table 3, the distance between b and n is 0.9, between b and p is 0.4, between d and n is 0.4, and between d and p is 0.6. Taking the maximum of these values gives 0.9 as the distance between (b,d) and (p,n). The other values are calculated similarly.
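The right half of Table 3 follows from the left half by taking, for each pair of contracted vertices, the maximum distance over their members, exactly as in the (b,d)/(p,n) computation above. A small sketch; the dictionary layout and the group labels are ours.

```python
from itertools import product

# Left half of Table 3: d(x, y) for x in O and y in O'.
D = {
    ("b", "n"): 0.9, ("b", "o"): 0.1, ("b", "p"): 0.4,
    ("c", "n"): 0.6, ("c", "o"): 0.7, ("c", "p"): 0.1,
    ("d", "n"): 0.4, ("d", "o"): 0.5, ("d", "p"): 0.6,
    ("e", "n"): 0.4, ("e", "o"): 0.5, ("e", "p"): 0.4,
}

# Contracted vertices of Fig. 9(II), each with the original concepts it stands for.
LEFT = {"(b,d)": ("b", "d"), "(b,e)": ("b", "e"), "(c)": ("c",)}
RIGHT = {"(p,n)": ("p", "n"), "(o)": ("o",)}


def contracted_distance(xs, ys):
    """Distance between two contracted vertices: the maximum over member pairs."""
    return max(D[(x, y)] for x, y in product(xs, ys))


TABLE3_RIGHT = {
    (gx, gy): contracted_distance(xs, ys)
    for gx, xs in LEFT.items()
    for gy, ys in RIGHT.items()
}
for key, value in TABLE3_RIGHT.items():
    print(key, value)
```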

Choosing $f(x) = x + 0.1$ and $g(x) = x$, Table 4 is the outcome of step 4. We explain how the values in row 1 are obtained. Since the edge between (b,e) and (b,d) is of type y while the edge between (o) and (p,n) is of type x (i.e., the edge is not preserved), $w_o = 0$. On the other hand, $w_l = g(d((b,d),(p,n))) + g(d((b,e),(o)))$, which equals $d((b,d),(p,n)) + d((b,e),(o))$, so $w_l = 0.9 + 0.5 = 1.4$. $w_r$ is computed similarly.



In row 4 of the table, the edge is preserved. Therefore $w_o = 1/(d((c),(p,n)) + 0.1) + 1/(d((b,d),(o)) + 0.1)$, which equals $1/0.7 + 1/0.6$. As the table shows, either mapping 4 or mapping 6 can be chosen as the best. This means that the problem is now reduced to two simpler subproblems: in the first, one decides which of p and n to map to c, and, in the second, which of b and d to map to o. Considering the individual distances between vertices, one can easily choose to map b to o and c to p. The extracted mapping is therefore the one depicted in Fig. 9(III).
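The enumeration behind Table 4 can be sketched in Python. It scores the six injective assignments of two contracted vertices of O to (p,n) and (o) with $f(x) = x + 0.1$ and $g(x) = x$, taking d from the right half of Table 3. The contracted edge set, (c)→(b,d):x, (c)→(b,e):x and (b,d)→(b,e):y in O and (p,n)→(o):x in O', is an assumption (Fig. 9 is not reproduced here), and so is counting a reversed edge as a typeless preservation. Under these assumptions the sketch reproduces rows 1 to 4 and 6 of Table 4 and picks one of the two tied best mappings.

```python
from itertools import permutations

# Right half of Table 3 (contracted vertices written without parentheses).
D = {("bd", "pn"): 0.9, ("bd", "o"): 0.5,
     ("be", "pn"): 0.9, ("be", "o"): 0.5,
     ("c", "pn"): 0.6, ("c", "o"): 0.7}
# Assumed typed, directed edges of the contracted ontologies (Fig. 9(II)).
G = {("c", "bd", "x"), ("c", "be", "x"), ("bd", "be", "y")}
GP = {("pn", "o", "x")}


def f(x):
    return x + 0.1  # as chosen in the text


def g(x):
    return x


def linked(edges, a, b):
    """Some edge joins a and b, in either direction, with any type."""
    return any((s, t) in ((a, b), (b, a)) for s, t, _ in edges)


def score(m):
    """(w_o, w_l, w_r, w) for a mapping m from contracted O to contracted O'."""
    inv = {y: x for x, y in m.items()}
    w_o = w_l = w_r = 0.0
    for v1, v2, t in G:
        if v1 in m and v2 in m:
            if (m[v1], m[v2], t) in GP:
                w_o += 1 / f(D[(v1, m[v1])]) + 1 / f(D[(v2, m[v2])])
            elif linked(GP, m[v1], m[v2]):
                w_l += g(D[(v1, m[v1])]) + g(D[(v2, m[v2])])
    for y1, y2, t in GP:
        if y1 in inv and y2 in inv:
            x1, x2 = inv[y1], inv[y2]
            if (x1, x2, t) not in G and linked(G, x1, x2):
                w_r += g(D[(x1, y1)]) + g(D[(x2, y2)])
    return w_o, w_l, w_r, w_o - w_l - w_r


# Every injective choice of the two O-vertices sent to (p,n) and (o).
results = {}
for src in permutations(["bd", "be", "c"], 2):
    results[src] = score(dict(zip(src, ["pn", "o"])))
    print(src, [round(v, 2) for v in results[src]])

best = max(results, key=lambda k: results[k][3])
```

Exhaustive enumeration is affordable here only because contraction left three and two vertices; the number of assignments grows factorially with ontology size, which motivates Sections 4.3 and 4.4.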



4.3. GA-Based Mapping Extraction 

One way to overcome the complexity of the naive approach is to treat the problem as an optimization problem and benefit from the different approaches in that realm. Here, we briefly report the result of applying a GA approach to this problem, as detailed in [39].

As in similar GA solutions, we need a fitness function to evaluate each individual in the population; we choose the coincidence-based weight function (Section 3.3). To obtain a concrete fitness function and individual evaluation, we need to define the normalization functions and a distance metric. The distance function may be string-based or of any other kind. Here, the distance between entities $e_i$ and $e_j$, i.e., $\delta(e_i, e_j)$, is taken to be the Levenshtein distance [40] between their labels. The normalization functions f and g are then defined. f should be a positive function that decreases in $d(v, m(v))$: as $d(v, m(v))$ grows, f decreases, reducing the positive score. g should be a positive function that increases with $d(v, m(v))$, increasing the negative score of that match. The normalization functions are defined by tuning the system.



$$f(v) = \frac{1}{e^{\delta(v, m(v))}}, \qquad g(v) = \frac{1}{e^{\max(5,\; 15 - \delta(v, m(v)))}}$$
Table 5. Head-to-head comparison of GA and the EON 2004 competitors on the EON 2004 tests.

Test    GA      Kar     UM      FUJI    Stan
201     0.40    0.43    0.44    0.98    1.00
202     0.38    n/a     0.38    0.95    1.00
204     0.74    0.62    0.55    0.95    0.99
205     0.48    0.47    0.49    0.79    0.95
206     0.67    0.48    0.46    0.85    1.00
221     (*)     n/a     0.61    0.98    0.99
222     0.74    n/a     0.55    0.99    0.98
223     0.79    0.59    0.59    0.95    0.95
224     1.00    0.97    0.97    0.99    0.99
225     0.98    n/a     0.59    0.99    0.99
228     (*)     n/a     0.38    0.91    1.00
230     0.85    0.60    0.46    0.97    0.99
301     0.85    0.85    0.49    0.89    0.93
302     0.83    1.00    0.23    0.39    0.94
303     0.68    0.85    0.31    0.51    0.85
304     0.85    0.91    0.44    0.85    0.97


Table 6. EON 2004 competitors vs GA.

Test     GA      Kar     UM      FUJI    Stan
2xx      0.70    0.59    0.54    0.94    0.99
3xx      0.82    0.90    0.37    0.66    0.92
total    0.73    0.71    0.48    0.89    0.97



These functions do satisfy the characteristics expected of f and g explained above: f decreases as δ grows, and g increases. Exponential functions are chosen for f and g so that they take closely comparable values. These functions reflect the discussion of positive and negative scores for the different categories of a coincidence-based weight.
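A sketch of the GA normalization functions, with δ taken as the Levenshtein distance between labels, may help check their monotonicity. The dynamic-programming edit distance below is the textbook one, written out only to keep the block self-contained; the function names are ours.

```python
import math


def levenshtein(s: str, t: str) -> int:
    """Classic edit distance (insertions, deletions and substitutions cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]


def f(delta: float) -> float:
    """Positive score, decreasing in delta: e^(-delta)."""
    return math.exp(-delta)


def g(delta: float) -> float:
    """Penalty, increasing in delta until delta = 10, then fixed at e^(-5)."""
    return math.exp(-max(5, 15 - delta))


for a, b in [("Hotel", "Hotel"), ("Hotel", "Hostel"), ("Hotel", "Museum")]:
    delta = levenshtein(a, b)
    print(a, b, delta, f(delta), g(delta))
```

The `max(5, ...)` term caps the penalty at $e^{-5}$, so very dissimilar labels are not punished without bound.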

In summary, the fitness function $w(m)$ of Section 3.3 is instantiated as follows:

$$f(x) = e^{-\delta(x, m(x))}$$

$$g(x) = e^{-\max(5,\; 15 - \delta(x, m(x)))}$$

$$\delta(x, m(x)) = LD(\mathrm{label}(x), \mathrm{label}(m(x)))$$

(LD = Levenshtein distance.) The next step is to design a crossover function that produces an offspring (a new alignment) from two parents (two alignments). In the crossover function, single nodes are compared based on their weights. As described in [39], the weight of a single node in an alignment is the sum of the weights of the pairs in which that node is included. The best pairs among the parents are chosen to be present in the offspring.
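The crossover step can be sketched as follows. An alignment is represented as a set of (O-concept, O'-concept) pairs, and `pair_weight` stands for whatever per-pair score the fitness function assigns; both, and the rule of comparing nodes on the O side only, are assumptions for illustration rather than the exact operator of [39]. Each node is scored as the sum of the weights of the pairs that contain it, and the offspring inherits that node's pairs from the parent in which it weighs more. No repair step for two sources sharing a target is included.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # (concept of O, concept of O')


def node_weights(alignment: Set[Pair], pair_weight: Callable[[Pair], float]) -> Dict[str, float]:
    """Weight of a source node = sum of the weights of the pairs that include it."""
    weights: Dict[str, float] = {}
    for pair in alignment:
        weights[pair[0]] = weights.get(pair[0], 0.0) + pair_weight(pair)
    return weights


def crossover(p1: Set[Pair], p2: Set[Pair], pair_weight: Callable[[Pair], float]) -> Set[Pair]:
    """For every source node, inherit its pairs from the parent where it weighs more."""
    w1, w2 = node_weights(p1, pair_weight), node_weights(p2, pair_weight)
    child: Set[Pair] = set()
    for node in set(w1) | set(w2):
        donor = p1 if w1.get(node, float("-inf")) >= w2.get(node, float("-inf")) else p2
        child |= {pair for pair in donor if pair[0] == node}
    return child


W = {("Hotel", "Hotel"): 1.0, ("Hotel", "Museum"): 0.1,
     ("Park", "Garden"): 0.6, ("Park", "Hotel"): 0.2}
parent1 = {("Hotel", "Hotel"), ("Park", "Hotel")}
parent2 = {("Hotel", "Museum"), ("Park", "Garden")}
child = crossover(parent1, parent2, lambda p: W[p])
print(sorted(child))  # [('Hotel', 'Hotel'), ('Park', 'Garden')]
```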

Our first experiment with GA resulted in a precision [37] of 0.7 when the two ontologies differ and 1 when they are the same. We conducted an experiment on a pair of Tourist ontologies [41] with a population of 1,000 individuals, and the genetic algorithm converged in 32 iterations.

For our second experiment, we chose the EON 2004 [42] dataset, which contains tests for benchmarking ontology alignment (OA) algorithms [43]. We did not use the 1xx series because it is overly simple. Table 5 compares the precision of this approach with that of the EON 2004 competitors as reported in [44].

In Tables 5 and 6, Kar, UM, FUJI, and Stan stand for the karlsruhe2, umontreal, fujitsu, and Stanford teams. Cells marked with an asterisk indicate tests not applicable to coincidence-based approaches in general: Ref. [43] reports that for 221, "all subclass assertions to named classes are suppressed," and for 228, "properties and relations between objects have been completely suppressed." GA outperforms the karlsruhe2 and umontreal teams in the 2xx tests, while karlsruhe2 outperforms GA in the 3xx tests. In the latter, which according to [43] are the real ontologies, GA outperforms fujitsu.

Table 6 summarizes the comparison. Ref. [44] summarizes EON 2004 as follows:

In this test, there are clear winners it seems that the results provided by Stanford and Fujitsu/Tokyo outperform those provided by Karlsruhe and Montreal/INRIA.

In fact, it can be considered that these constitute two groups of programs. The Stanford+Fujitsu programs are very different but strongly based on the labels attached to entities. For that reason they performed especially well when labels were preserved (i.e., most of the time). The Karlsruhe+INRIA systems tend to rely on many different features and thus to balance the influence of individual features, so they tend to reduce the fact that labels were preserved.

Given that coincidence-based approaches are generally not concerned with the mere labels attached to the entities, one can hardly say that they strongly rely on them. (One may argue that the types in the graphs are defined based on the labels of the graphs, and that is a correct observation. As explained throughout this paper, however, coincidence-based approaches are concerned much more with how typed graphs coincide than with labels.) We thus note that, within the group of systems that do not rely mainly on labels, GA outperforms both Kar and UM.

GA approaches generally try to find near-optimal solutions, not necessarily the global optimum. Because the runtime complexity of the naive approach limits its use to small ontologies, we have to rely on approximations such as those a GA yields. Ref. [39] details this approach and explains how to keep the algorithm from falling into a local optimum.

4.4. Approximative Approaches 

Consider forming a new graph: a bipartite graph $G(V_1, V_2, E)$, where $V_1$ and $V_2$ are the sets of concepts of $O_1$ and $O_2$, respectively. An edge $e \in E$ is not typed but weighted; its weight is the mutual distance between its source and target (as calculated in phase 1). Applying a Maximum Weight Matching [27] for matching extraction here would be quite unwise, because one would lose the inherent structure and interrelationships of both ontologies. That is, regardless of the internal structure of the ontologies, choosing edges this way would merely optimize the overall mutual distances between the concepts. In other words, there is no estimate of how well the resulting matching preserves the structure.

On the other hand, as described in Section 3.1, a method which considers structure alone is not precise enough either. As also explained in that section, coincidence, as a measure of how coincident the two ontologies are as a whole, becomes helpful. We explained further in Section 3.1 that the notion of coincidence goes hand in hand with alignments. Unfortunately, however, as discussed in Section 3.4, the complexity of the problem (of Typed Graph Isomorphism) is currently unknown. We only know that it is at least as hard as the Graph Isomorphism problem itself.

The Naive Approach of Section 4.2 is exponential. In Section 4.3, we tried to alleviate this complexity by exploiting a GA, which avoids exhaustive search. Still, knowing that exhaustive search is too complex to be practical, this section offers a direct approach for arriving at an alignment close to the best-coinciding one. The question is then: "Is there a feasible way to directly find an alignment which, although it may not be the best coinciding, is close to it?" This is the question that the approaches we call Approximative try to answer.

To find the best-coinciding alignment, the Naive Approach (Section 4.2) examines all the possible choices, which is of course overkill. To find a close-to-best-coinciding one, however, it helps to first estimate how much pairing two arbitrary nodes would contribute to the degree of coincidence of the two ontologies. Once we have these pairwise estimates, we apply a minimization algorithm to minimize the overall distance. This, as summarized in Algorithm 2, is the sketch of our approximative approach.

In this section, only a brief discussion of how to apply the technique is presented. In particular, readers interested in why we do not use Maximum Weight Matching [27] to minimize the overall distance may consult ibid. In short, we choose Maximum Weight Non-crossing Matching [45] over the former algorithm because the former may produce results which are conceptually wrong. For example, in Fig. 8, it may choose to map a to b' and at the same time c to a', which, as discussed in Section 4.1, is wrong.
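Maximum Weight Non-crossing Matching presupposes that the vertices on each side come in a fixed order (for instance, along an IS-A chain such as the one in Fig. 8); two chosen edges (i, j) and (i', j') may not cross, i.e., i < i' must imply j < j'. Under that assumption the problem has a simple O(|V_1|·|V_2|) dynamic program, sketched below. This is a standard formulation, not necessarily the algorithm of [45], and the weights in the usage lines are made up.

```python
from typing import List, Tuple


def max_noncrossing_matching(w: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """w[i][j] is the weight of pairing the i-th left vertex with the j-th right one.

    Returns the best total weight and the chosen (i, j) pairs; edges with
    non-positive weight are never taken.
    """
    n = len(w)
    k = len(w[0]) if n else 0
    # best[i][j]: optimum using only the first i left and first j right vertices.
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            best[i][j] = max(best[i - 1][j],                        # skip left vertex i-1
                             best[i][j - 1],                        # skip right vertex j-1
                             best[i - 1][j - 1] + w[i - 1][j - 1])  # match them
    # Walk back through the table to recover the chosen pairs.
    pairs, i, j = [], n, k
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return best[n][k], pairs[::-1]


# Fig. 8-style chain: rows a, b, c; columns a', b', c'.
W = [[0.5, 0.9, 0.0],
     [0.0, 0.5, 0.0],
     [0.9, 0.0, 0.5]]
print(max_noncrossing_matching(W))  # (1.5, [(0, 0), (1, 1), (2, 2)])
```

With these weights, an unconstrained maximum weight matching would prefer (a, b') and (c, a'), worth 1.8, which is exactly the crossing choice the text rules out; the non-crossing version keeps the order of the chain.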

It is natural to ask here: "How is step 2 of Algorithm 2 done? And where are the random walks used?" Algorithm 3 shows how the weights of the edges of E are calculated, which we believe answers both questions. There, with s a randomly generated stem, $w_s(m(s)) = w_s^+(m(s)) - w_s^-(m(s))$, for



Algorithm 2. Sketch of the random walk approximation.

1: Input ontologies $O_1(V_1, E_1)$ and $O_2(V_2, E_2)$ along with the metric space $(X, d)$ in which they are embedded.

2: Construct a bipartite graph $G(V_1, V_2, E)$ with weighted edges; each edge shows how helpful pairing its endpoints is for making the two ontologies $O_1$ and $O_2$ look more coinciding on $(X, d)$.

3: Apply Maximum Weight Non-crossing Matching to $G$.

4: Output the resulting heaviest matching.



Algorithm 3. Calculation of weights of E.

for all $v \in V_1 \cup V_2$ do {Initialisation}
    find the typical edge met along all v-stems.
end for
for all $v_1 \in V_1$ do {Evaluation}
    for all $v_2 \in V_2$ do
        Generate a random $v_1$-stem $s_1$, calculate $w_{s_1}(m(s_1))$
        Generate a random $v_2$-stem $s_2$, calculate $w_{s_2}(m(s_2))$
        $e(v_1, v_2) \leftarrow w_{s_1}(m(s_1)) + w_{s_2}(m(s_2))$
    end for
end for



which: 



-■iS-A^ 



w s +(m(s))= £ ("j — T( 

, , s VI — dlu,u 

u f =m(u) 

Pm{u,.) 

w s -(m(.))= £ d(u,u')-° SM 

u f Gm(s) 
u'=m(u) 

TP m (u,.) 

Here, for each $v_1$ and $v_2$, if $s$ is a $v_1$-stem, $m(s)$ is the $v_2$-stem chosen to coincide best with $s$. Finally, the symbols are explained as follows:

• In each iteration of the summation, $u$ and $u'$ are the current vertices. By $u' = m(u)$, we mean that $u$ is from the pattern graph ($O_1$) and is matched with $u'$ from the target graph ($O_2$).

• $d(.,.)$ is a function returning the distance between the concepts given as its input. Note that this distance is in fact the metric of our metric space.

• $Q_s(.,.)$ is a function which takes two vertices of a graph and returns the number of edges met along $s$ when going from the first to the second. Note that this is independent of our metric space. In fact, it is applicable to any graph, yet it differs from the common way of defining the distance between vertices in graphs [46].

• $D_{is\text{-}a}(.,.)$ is the number of IS-A relationships met from the first argument toward the second when traversing $s$.

It is worth mentioning that $Q_s(m',v') - D_{is\text{-}a}(m',v')$ is always equal to $Q_s(m,v)$. However, we prefer to retain the former because it better shows our purpose. Of course, for efficiency, one may in practice choose to use the latter instead.
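The loop structure of Algorithm 3 can be sketched as below. Only the control flow follows the algorithm: random stems are drawn from both sides, and $e(v_1, v_2)$ is the sum of the two stem weights, which step 3 of Algorithm 2 would then consume. Everything else is a hypothetical stand-in: `stem_score` replaces the exact summands of $w_s^+$ and $w_s^-$ (it rewards $1 - d$ at positions whose incoming edge types agree and charges $d$ otherwise), $m(s)$ is taken as the best of a few random target stems, the stem length and number of tries are arbitrary, and the Initialisation step is omitted.

```python
import random
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # vertex -> [(neighbour, edge type)]
Stem = List[Tuple[str, str]]              # [(vertex, type of the edge used to reach it)]


def random_stem(graph: Graph, start: str, length: int, rng: random.Random) -> Stem:
    """Random walk of at most `length` steps from `start`."""
    stem, v = [(start, "")], start
    for _ in range(length):
        if not graph.get(v):
            break  # dead end
        v, t = rng.choice(graph[v])
        stem.append((v, t))
    return stem


def stem_score(s: Stem, image: Stem, dist: Callable[[str, str], float]) -> float:
    """Hypothetical stand-in for w_s(m(s)), compared position by position."""
    total = 0.0
    for (u, t), (u2, t2) in zip(s, image):
        total += (1 - dist(u, u2)) if t == t2 else -dist(u, u2)
    return total


def edge_weights(g1: Graph, g2: Graph, dist: Callable[[str, str], float],
                 length: int = 3, tries: int = 5, seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Evaluation loop of Algorithm 3: e(v1, v2) = w_s1(m(s1)) + w_s2(m(s2))."""
    rng = random.Random(seed)

    def rdist(a: str, b: str) -> float:
        return dist(b, a)  # same metric, arguments seen from the O2 side

    e = {}
    for v1 in g1:
        for v2 in g2:
            s1 = random_stem(g1, v1, length, rng)
            # m(s1): the best-coinciding of a few random v2-stems.
            m_s1 = max((random_stem(g2, v2, length, rng) for _ in range(tries)),
                       key=lambda c: stem_score(s1, c, dist))
            s2 = random_stem(g2, v2, length, rng)
            m_s2 = max((random_stem(g1, v1, length, rng) for _ in range(tries)),
                       key=lambda c: stem_score(s2, c, rdist))
            e[(v1, v2)] = stem_score(s1, m_s1, dist) + stem_score(s2, m_s2, rdist)
    return e


G1 = {"Vehicle": [("Car", "subclass")], "Car": []}
G2 = {"Automobile": [("Sedan", "subclass")], "Sedan": []}
DIST = {("Vehicle", "Automobile"): 0.2, ("Vehicle", "Sedan"): 0.7,
        ("Car", "Automobile"): 0.6, ("Car", "Sedan"): 0.1}
weights = edge_weights(G1, G2, lambda a, b: DIST[(a, b)])
print(weights)
```

Seeding the generator keeps the weights reproducible, which matters when the resulting bipartite graph is compared across runs.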



5. Conclusions and Future Works 

We can summarize the novelty of this work as follows. First, the "coincidence" factor was introduced for the first time in [47] by three of the authors of the current work. Second, to our knowledge, the work reported in this paper is the first general formulation of the mapping extraction problem. Third, other works either leave the extraction of mappings entirely to the user, perform it in cooperation with the user, or perform only a simple form of extraction.

The main contribution of this paper is a formal definition of the problem together with our solution to it. The paper also includes some experimental results on the GA implementation. The main merit of this approach shows when it is applied to large ontologies in which structural relations play an important role in alignment selection. For simple ontologies, where decisions are mainly based on label similarities, other lexical approaches might perform better.

We are now extending our research to find other ways of using the coincidence-based weighting. One direction is to introduce approximative algorithms which perform a more elaborate random walk. Another direction is to study other works in which Graph Theory and Metric Spaces are considered together, and to find new ideas for further reducing the size of the basic mechanism; see [48], for example. Given that Domain Theory [33] is commonly used for describing the semantics of programming languages [49], we believe there is ample room for bringing those ideas into OM, especially for better adjusting the partial order discussed in this paper.